Extracting Linguistic Speech Patterns of Japanese Fictional Characters using Subword Units

Authors

Abstract

This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appear in the lines of anime or game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime or game characters using subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF/IDF according to gender, age, and each anime character, and show that they are linguistic speech patterns specific to each feature. Additionally, a classification experiment shows that the model with subword units outperformed that with the conventional method.
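
The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: character n-grams stand in for learned subword units, each group's lines are treated as one document, and the function names (`char_ngrams`, `tfidf_by_group`), the group labels, and the toy lines are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Return all character n-grams of length n_min..n_max.

    Assumption: n-grams are a crude stand-in for learned subword units.
    """
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf_by_group(lines_by_group, n_min=2, n_max=3):
    """Score each group's n-grams by TF-IDF, one 'document' per group."""
    # Term counts per group (e.g. per character, gender, or age band).
    counts = {
        group: Counter(
            unit for line in lines for unit in char_ngrams(line, n_min, n_max)
        )
        for group, lines in lines_by_group.items()
    }
    n_groups = len(counts)
    # Document frequency: in how many groups does each unit occur?
    df = Counter(unit for c in counts.values() for unit in c)
    scores = {}
    for group, c in counts.items():
        total = sum(c.values())
        scores[group] = {
            unit: (k / total) * math.log(n_groups / df[unit])
            for unit, k in c.items()
        }
    return scores


# Hypothetical toy data: two characters with different utterance endings.
lines_by_group = {
    "character_a": ["行くのだ", "食べるのだ"],
    "character_b": ["行くわよ", "食べるわよ"],
}
scores = tfidf_by_group(lines_by_group)
for group, unit_scores in scores.items():
    top = sorted(unit_scores.items(), key=lambda kv: -kv[1])[:3]
    print(group, top)
```

In this toy data, stems shared by both characters (such as 行く) score zero, while the ending のだ ranks highest for `character_a`. A learned subword tokenizer would replace `char_ngrams` in a closer approximation of the method.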

Similar resources

Towards an Entertaining Natural Language Generation System: Linguistic Peculiarities of Japanese Fictional Characters

One of the key ways of making dialogue agents more attractive as conversation partners is characterization, as it makes the agents more friendly, humanlike, and entertaining. To build such characters, utterances suitable for the characters are usually manually prepared. However, it is expensive to do this for a large number of utterances. To reduce this cost, we are developing a natural languag...

Creating Large Subword Units for Speech Recognition

This paper deals with the choice of suitable subword units (SWU) for an HMM-based speech recognition system. Using demisyllables (including phonemes) as base units, an inventory of domain-specific, larger subword units, so-called macro-demisyllables (MDS), is created. A quality measure for the automatic decomposition of all single words into subword units is presented which takes into accou...

Topic Spotting Using Subword Units

Belongs to proposal section 4.8. The research project underlying this report was funded by the Federal Minister for Education, Science, Research and Technology under grant number 01 IV 701 K/5. Responsibility for the content of this work lies with the authors. Abstract: In this paper we present a new approach for topic spotting based on subword units...

Unravelling Names of Fictional Characters

In this paper we explore the correlation between the sound of words and their meaning, by testing if the polarity (‘good guy’ or ‘bad guy’) of a character’s role in a work of fiction can be predicted by the name of the character in the absence of any other context. Our approach is based on phonological and other features proposed in prior theoretical studies of fictional names. These features a...

Journal

Journal title: International Journal on Natural Language Computing

Year: 2022

ISSN: 2278-1307, 2319-4111

DOI: https://doi.org/10.5121/ijnlc.2022.11101